Isabelle/HOL


Beyond Gold Standards: Epistemic Ensemble of LLM Judges for Formal Mathematical Reasoning

Zhang, Lan, Valentino, Marco, Freitas, Andre

arXiv.org Artificial Intelligence

Autoformalization plays a crucial role in formal mathematical reasoning by enabling the automatic translation of natural language statements into formal languages. While recent advances using large language models (LLMs) have shown promising results, methods for automatically evaluating autoformalization remain underexplored. As one moves to more complex domains (e.g., advanced mathematics), human evaluation requires significant time and domain expertise, especially as the complexity of the underlying statements and background knowledge increases. LLM-as-a-judge presents a promising approach for automating such evaluation. However, existing methods typically employ coarse-grained and generic evaluation criteria, which limit their effectiveness for advanced formal mathematical reasoning, where quality hinges on nuanced, multi-granular dimensions. In this work, we take a step toward addressing this gap by introducing a systematic, automatic method to evaluate autoformalization tasks. The proposed method is based on an epistemically and formally grounded ensemble (EFG) of LLM judges, defined on criteria encompassing logical preservation (LP), mathematical consistency (MC), formal validity (FV), and formal quality (FQ), resulting in a transparent assessment that accounts for different contributing factors. We validate the proposed framework to serve as a proxy for autoformalization assessment within the domain of formal mathematics. Overall, our experiments demonstrate that the EFG ensemble of LLM judges is a suitable emerging proxy for evaluation, more strongly correlating with human assessments than a coarse-grained model, especially when assessing formal qualities. These findings suggest that LLM-as-judges, especially when guided by a well-defined set of atomic properties, could offer a scalable, interpretable, and reliable support for evaluating formal mathematical reasoning.
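As a minimal sketch of how such an ensemble could combine fine-grained verdicts, the snippet below averages hypothetical per-judge scores on the four EFG criteria and reports both the per-criterion breakdown and an overall mean. The scores and the aggregation rule are illustrative assumptions, not the paper's actual method.

```python
from statistics import mean

# Hypothetical per-judge scores (0-1) for one candidate formalization,
# on the four EFG criteria: logical preservation (LP), mathematical
# consistency (MC), formal validity (FV), and formal quality (FQ).
judge_scores = [
    {"LP": 0.9, "MC": 0.8, "FV": 1.0, "FQ": 0.7},
    {"LP": 0.8, "MC": 0.9, "FV": 1.0, "FQ": 0.6},
    {"LP": 1.0, "MC": 0.7, "FV": 1.0, "FQ": 0.8},
]

def aggregate(scores, criteria=("LP", "MC", "FV", "FQ")):
    """Average each criterion across judges, then report both the
    per-criterion breakdown and an overall mean."""
    per_criterion = {c: mean(s[c] for s in scores) for c in criteria}
    overall = mean(per_criterion.values())
    return per_criterion, overall

per_criterion, overall = aggregate(judge_scores)
print(per_criterion)      # transparent, per-dimension breakdown
print(round(overall, 3))
```

Keeping the per-criterion breakdown alongside the overall score is what makes the assessment transparent: a formalization can be formally valid (FV) yet fail to preserve the logical content (LP), and a single scalar would hide that.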


Faithful Logic Embeddings in HOL -- A recipe to have it all: deep and shallow, automated and interactive, heavy and light, proofs and counterexamples, meta and object level

Benzmüller, Christoph

arXiv.org Artificial Intelligence

Deep and shallow embeddings of non-classical logics in classical higher-order logic have been explored, implemented, and used in various automated reasoning tools in recent years. This paper presents a recipe for the simultaneous deployment of different forms of deep and shallow embeddings in classical higher-order logic, enabling not only flexible interactive and automated theorem proving and counterexample finding at meta and object level, but also automated faithfulness proofs between the logic embeddings. The approach, which is fruitful for logic education, research and application, is deliberately illustrated here using simple propositional modal logic. However, the work presented is conceptual in nature and not limited to such a simple logic context.

Keywords: logic embeddings, faithfulness, automated reasoning

1 Motivation and Introduction

Deep embeddings of logics, or more generally of domain-specific languages, in a suitable metalogic, such as classical higher-order logic (HOL) [23,4], are typically based on explicitly introduced abstract data types that essentially axiomatize the inductively defined character of the new language.
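For intuition, the idea behind a shallow embedding can be mimicked outside HOL by representing modal formulas directly as predicates on worlds, with box and diamond quantifying over an accessibility relation. The toy Python model below (worlds, relation, and valuation all invented for illustration) shows the core of the semantical definitions.

```python
# A minimal sketch (in Python rather than HOL) of the idea behind a
# *shallow* embedding: modal formulas are world-predicates, and the
# modal operators quantify over the accessibility relation R.

W = {0, 1, 2}                      # worlds (illustrative)
R = {(0, 1), (0, 2), (1, 2)}       # accessibility relation
p = lambda w: w in {1, 2}          # atomic proposition as a world-predicate

def box(phi):
    # []phi holds at w iff phi holds at every R-successor of w
    return lambda w: all(phi(v) for (u, v) in R if u == w)

def dia(phi):
    # <>phi holds at w iff phi holds at some R-successor of w
    return lambda w: any(phi(v) for (u, v) in R if u == w)

def valid(phi):
    # validity = truth at every world of the (fixed) model
    return all(phi(w) for w in W)

print(box(p)(0))        # p holds at both successors of world 0
print(valid(dia(p)))    # fails: world 2 has no successor
```

A deep embedding would instead introduce a datatype of formulas and a recursive evaluation function over it, which is what makes automated faithfulness proofs between the two representations a meaningful goal.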


Formalizing Complex Mathematical Statements with LLMs: A Study on Mathematical Definitions

Zhang, Lan, Valentino, Marco, Freitas, Andre

arXiv.org Artificial Intelligence

Thanks to their linguistic capabilities, LLMs offer an opportunity to bridge the gap between informal mathematics and formal languages through autoformalization. However, it is still unclear how well LLMs generalize to sophisticated and naturally occurring mathematical statements. To address this gap, we investigate the task of autoformalizing real-world mathematical definitions -- a critical component of mathematical discourse. Specifically, we introduce two novel resources for autoformalization, collecting definitions from Wikipedia (Def_Wiki) and arXiv papers (Def_ArXiv). We then systematically evaluate a range of LLMs, analyzing their ability to formalize definitions into Isabelle/HOL. Furthermore, we investigate strategies to enhance LLMs' performance, including refinement through external feedback from proof assistants, and formal definition grounding, where we guide LLMs through relevant contextual elements from formal mathematical libraries. Our findings reveal that definitions present a greater challenge compared to existing benchmarks, such as miniF2F. In particular, we found that LLMs still struggle with self-correction and with aligning to relevant mathematical libraries. At the same time, structured refinement methods and definition grounding strategies yield notable improvements of up to 16% in self-correction capabilities and 43% in the reduction of undefined errors, highlighting promising directions for enhancing LLM-based autoformalization in real-world scenarios.
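Schematically, refinement through proof-assistant feedback is a generate-check-repair loop. The stubs below (llm_formalize, check_with_isabelle) are hypothetical stand-ins that simulate one repair round; neither is an actual API from the paper or from Isabelle.

```python
# A schematic sketch of refinement through external prover feedback.
# Both functions are invented stubs that simulate the loop's behaviour.

def llm_formalize(definition, feedback=None):
    # Stub: pretend the model fixes an undefined constant once told about it.
    if feedback and "undefined" in feedback:
        return "definition group where ..."   # repaired attempt
    return "definition grp where ..."         # first attempt, with a typo

def check_with_isabelle(attempt):
    # Stub: the real step would run Isabelle and collect its error messages.
    if "grp" in attempt:
        return "error: undefined constant grp"
    return None                               # no errors: accepted

def refine(definition, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        attempt = llm_formalize(definition, feedback)
        feedback = check_with_isabelle(attempt)
        if feedback is None:
            return attempt
    return None                               # budget exhausted

print(refine("A group is a set with an associative operation ..."))
```

The loop structure, rather than the stubs, is the point: the prover's error messages are the external signal that the abstract reports LLMs struggle to produce for themselves via self-correction.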


Formally Verified Neurosymbolic Trajectory Learning via Tensor-based Linear Temporal Logic on Finite Traces

Chevallier, Mark, Smola, Filip, Schmoetten, Richard, Fleuriot, Jacques D.

arXiv.org Artificial Intelligence

We present a novel formalisation of tensor semantics for linear temporal logic on finite traces (LTLf), with formal proofs of correctness carried out in the theorem prover Isabelle/HOL. We demonstrate that this formalisation can be integrated into a neurosymbolic learning process by defining and verifying a differentiable loss function for the LTLf constraints, and automatically generating an implementation that integrates with PyTorch. We show that, by using this loss, the process learns to satisfy pre-specified logical constraints. Our approach offers a fully rigorous framework for constrained training, eliminating many of the inherent risks of ad-hoc, manual implementations of logical aspects directly in an "unsafe" programming language such as Python, while retaining efficiency in implementation.
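The degree-of-satisfaction idea behind such a loss can be sketched with ordinary max/min semantics over a finite trace: conjunction-like operators take minima, "eventually" takes a maximum over time steps, and the loss is the gap to full satisfaction. The Python toy below stands in for the paper's verified tensor formalisation and PyTorch integration; the trace values are invented.

```python
# A toy sketch (plain Python, not the verified Isabelle/PyTorch pipeline)
# of evaluating LTLf operators over a finite trace with real-valued
# "degree of satisfaction" semantics.

def eventually(vals):
    # F p : p must hold at some point in the trace
    return max(vals)

def globally(vals):
    # G p : p must hold at every point in the trace
    return min(vals)

trace = [0.1, 0.4, 0.9, 0.2]     # per-step truth degree of an atom p

print(eventually(trace))
print(globally(trace))

loss = 1.0 - eventually(trace)   # small when F p is nearly satisfied
print(round(loss, 2))
```

In a differentiable setting, max and min would typically be replaced by smooth surrogates so that gradients flow through the constraint; verifying that such a loss is faithful to the logical semantics is exactly the kind of property the paper discharges in Isabelle/HOL.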


Formally Certified Approximate Model Counting

Tan, Yong Kiam, Yang, Jiong, Soos, Mate, Myreen, Magnus O., Meel, Kuldeep S.

arXiv.org Artificial Intelligence

Approximate model counting is the task of approximating the number of solutions to an input Boolean formula. The state-of-the-art approximate model counter for formulas in conjunctive normal form (CNF), ApproxMC, provides a scalable means of obtaining model counts with probably approximately correct (PAC)-style guarantees. Nevertheless, the validity of ApproxMC's approximation relies on a careful theoretical analysis of its randomized algorithm and the correctness of its highly optimized implementation, especially the latter's stateful interactions with an incremental CNF satisfiability solver capable of natively handling parity (XOR) constraints. We present the first certification framework for approximate model counting with formally verified guarantees on the quality of its output approximation. Our approach combines: (i) a static, once-off, formal proof of the algorithm's PAC guarantee in the Isabelle/HOL proof assistant; and (ii) dynamic, per-run, verification of ApproxMC's calls to an external CNF-XOR solver using proof certificates. We detail our general approach to establish a rigorous connection between these two parts of the verification, including our blueprint for turning the formalized, randomized algorithm into a verified proof checker, and our design of proof certificates for both ApproxMC and its internal CNF-XOR solving steps. Experimentally, we show that certificate generation adds little overhead to an approximate counter implementation, and that our certificate checker is able to fully certify 84.7% of instances with generated certificates when given the same time and memory limits as the counter.
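The hashing intuition behind ApproxMC-style counting can be illustrated by brute force: random XOR (parity) constraints split the solution set into roughly equal cells, so counting one cell and scaling by 2^m estimates the total. Everything below (the formula-free "model set", the seed, the number of constraints) is an illustrative assumption, not the tool's algorithm.

```python
import random

# Toy illustration of XOR hashing for approximate model counting.
# A real counter enumerates only one small cell via a SAT solver;
# here we brute-force a made-up model set for intuition.

random.seed(0)
n = 10
solutions = [tuple(random.randrange(2) for _ in range(n)) for _ in range(300)]
solutions = list(set(solutions))          # the "model set" to count

def random_xor(n):
    # a random parity constraint: coeffs . x == rhs (mod 2)
    coeffs = [random.randrange(2) for _ in range(n)]
    rhs = random.randrange(2)
    return lambda x: sum(c * v for c, v in zip(coeffs, x)) % 2 == rhs

m = 3                                     # number of XOR constraints
xors = [random_xor(n) for _ in range(m)]
cell = [x for x in solutions if all(h(x) for h in xors)]
estimate = len(cell) * 2 ** m
print(len(solutions), estimate)           # true count vs. hashed estimate
```

The PAC analysis the paper formalizes in Isabelle/HOL is precisely about how many constraints and repetitions make such an estimate land within a chosen tolerance with high probability.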


Formally Verified Approximate Policy Iteration

Schäffeler, Maximilian, Abdulaziz, Mohammad

arXiv.org Artificial Intelligence

We formally verify an algorithm for approximate policy iteration on Factored Markov Decision Processes using the interactive theorem prover Isabelle/HOL. Next, we show how the formalized algorithm can be refined to an executable, verified implementation. The implementation is evaluated on benchmark problems to show its practicability. As part of the refinement, we develop verified software to certify Linear Programming solutions. The algorithm builds on a diverse library of formalized mathematics and pushes existing methodologies for interactive theorem provers to the limits. We discuss the process of the verification project and the modifications to the algorithm needed for formal verification.
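For readers unfamiliar with the underlying loop, a minimal (exact, non-factored) policy iteration on a toy MDP looks as follows. The states, transitions, and rewards are invented; the paper's verified algorithm instead works with approximate value representations over factored MDPs, using linear programming in the process.

```python
# Minimal exact policy iteration on a tiny explicit MDP (toy data).

states = [0, 1]
actions = ["a", "b"]
gamma = 0.9
# P[(s, a)] = [(prob, next_state)], R[(s, a)] = immediate reward
P = {(0, "a"): [(1.0, 0)], (0, "b"): [(1.0, 1)],
     (1, "a"): [(1.0, 0)], (1, "b"): [(1.0, 1)]}
R = {(0, "a"): 0.0, (0, "b"): 1.0, (1, "a"): 0.0, (1, "b"): 2.0}

def evaluate(policy, iters=500):
    # iterative policy evaluation (fixed point of the Bellman operator)
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        V = {s: R[(s, policy[s])] +
                gamma * sum(p * V[t] for p, t in P[(s, policy[s])])
             for s in states}
    return V

def improve(V):
    # greedy policy improvement with respect to V
    return {s: max(actions,
                   key=lambda a: R[(s, a)] +
                                 gamma * sum(p * V[t] for p, t in P[(s, a)]))
            for s in states}

policy = {s: "a" for s in states}
for _ in range(10):                       # the policy iteration loop
    policy = improve(evaluate(policy))
print(policy)                             # converges to taking "b" everywhere
```

In the approximate, factored setting, the evaluation step cannot be done exactly, which is why the verified implementation needs certified linear programming solutions in its place.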


Teaching Higher-Order Logic Using Isabelle

Lund, Simon Tobias, Villadsen, Jørgen

arXiv.org Artificial Intelligence

Higher-order logic, also known as simple type theory [3], has been described as the combination of functional programming and logic [9], and has proved a very powerful tool for the formalization of mathematics and computer science. It is an expressive enough logic to cover a wide array of fields, while still being built on relatively simple principles, and a number of proof assistants based on higher-order logic are available. We consider formal reasoning in the generic proof assistant Isabelle [10, 11]. In the present paper we are taking advantage of the genericity of Isabelle, but we also find that Isabelle is at least as user-friendly and intuitive as other proof assistants of comparable power. Although Isabelle is generic and comes with a number of object logics like first-order logic (FOL) and axiomatic set theory (ZF), the default object logic is higher-order logic, called Isabelle/HOL.


An Encoding of Abstract Dialectical Frameworks into Higher-Order Logic

Martina, Antoine, Steen, Alexander

arXiv.org Artificial Intelligence

An approach for encoding abstract dialectical frameworks and their semantics into classical higher-order logic is presented. Important properties and semantic relationships are formally encoded and proven using the proof assistant Isabelle/HOL. This approach allows for the computer-assisted analysis of abstract dialectical frameworks using automated and interactive reasoning tools within a uniform logic environment. Exemplary applications include the formal analysis and verification of meta-theoretical properties, and the generation of interpretations and extensions under specific semantic constraints.
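A two-valued toy version of ADF semantics can be written down directly: each statement carries an acceptance condition over the others, and a model is an assignment that agrees with every condition. The framework below is invented for illustration and ignores the three-valued semantics such encodings also treat.

```python
from itertools import product

statements = ["a", "b", "c"]
# Acceptance conditions (toy): a is accepted unconditionally; b is
# accepted iff a is; c is accepted iff b is not (an attack from b to c).
acceptance = {
    "a": lambda v: True,
    "b": lambda v: v["a"],
    "c": lambda v: not v["b"],
}

def models():
    # enumerate all two-valued assignments and keep those where every
    # statement's value matches its acceptance condition
    for bits in product([False, True], repeat=len(statements)):
        v = dict(zip(statements, bits))
        if all(v[s] == acceptance[s](v) for s in statements):
            yield v

print(list(models()))
```

Encoding the acceptance conditions as HOL predicates, as the paper does, makes exactly this kind of enumeration and meta-level property checking available to Isabelle's automated and interactive tools.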


Normative Conditional Reasoning as a Fragment of HOL

Parent, Xavier, Benzmüller, Christoph

arXiv.org Artificial Intelligence

We report some results regarding the mechanization of normative (preference-based) conditional reasoning. Our focus is on Aqvist's system E for conditional obligation (and its extensions). Our mechanization is achieved via a shallow semantical embedding in Isabelle/HOL. We consider two possible uses of the framework. The first one is as a tool for meta-reasoning about the considered logic. We employ it for the automated verification of deontic correspondences (broadly conceived) and related matters, analogous to what has been previously achieved for the modal logic cube. The second use is as a tool for assessing ethical arguments. We provide a computer encoding of a well-known paradox in population ethics, Parfit's repugnant conclusion. Whether the presented encoding increases or decreases the attractiveness and persuasiveness of the repugnant conclusion is a question we would like to pass on to philosophy and ethics.
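The preference-based truth condition for conditional obligation, namely that O(B|A) holds iff the best A-worlds are all B-worlds, can be sketched concretely. The worlds and ranking below are invented, and this Python stand-in only gestures at the HOL embedding of system E.

```python
# Toy sketch of preference-based conditional obligation:
# O(B|A) holds iff B is true in all best (most preferred) A-worlds.

worlds = [0, 1, 2, 3]
rank = {0: 0, 1: 1, 2: 1, 3: 2}       # lower rank = more preferred (toy)

def best(A):
    # the most preferred worlds among those satisfying A
    A_worlds = [w for w in worlds if A(w)]
    top = min(rank[w] for w in A_worlds)
    return [w for w in A_worlds if rank[w] == top]

def obligatory(B, given):
    # O(B|A): every best A-world is a B-world
    return all(B(w) for w in best(given))

A = lambda w: w in {1, 2, 3}          # the antecedent's extension
B = lambda w: w in {1, 2}

print(obligatory(B, A))               # the best A-worlds are 1 and 2
```

In the shallow HOL embedding, the ranking becomes a betterness relation between worlds and O(B|A) a defined HOL term, which is what lets Isabelle's provers verify deontic correspondences automatically.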


On Exams with the Isabelle Proof Assistant

Jacobsen, Frederik Krogsdal, Villadsen, Jørgen

arXiv.org Artificial Intelligence

At the Technical University of Denmark, we currently teach an MSc level course on automated reasoning using the Isabelle proof assistant [11] as our main tool. The course is a 5 ECTS optional course and the homepage is here: https://courses.compute.dtu.dk/02256/

The course is an introduction to automatic and interactive theorem proving, and Isabelle is used to formalize almost all of the concepts we introduce during the course. We have developed a number of external tools to allow us to teach basic proofs in natural deduction and sequent calculus while slowly progressing towards showing students the full power of Isabelle. The learning objectives of the course are as follows:

1. explain the basic concepts introduced in the course
2. express mathematical theorems and properties of IT systems formally
3. master the natural deduction proof system
4. relate first-order logic, higher-order logic and type theory
5. construct formal proofs in the procedural style and in the declarative style
6. use automatic and interactive computer systems for automated reasoning
7. evaluate the trustworthiness of proof assistants and related tools
8. communicate solutions to problems in a clear and precise manner

We expect students to already know some logic and to be relatively proficient in functional programming before starting the course. Additionally, we expect students to have some basic knowledge of artificial intelligence algorithms for deduction. Our undergraduate program in computer science and engineering, which many of our students have completed, contains several courses that introduce students to these topics.